Abstract

This paper outlines a method to automatically detect targets from sets of pixel-registered visual, thermal, and range images. The method uses operations specifically designed for each kind of image. It also introduces a morphological operation called "erosion of strength n" as a powerful tool for removing spurious information. Good preliminary detection results support the method's suitability for the Automatic Target Recognition (ATR) problem.

Introduction

By using multiple images from different sensors to detect and recognize targets, we can take advantage of the specific characteristics of each sensor and its images, and combine them to raise detection and/or recognition rates. A very good, yet brief, presentation of the different sensors used in ATR is given by Bhanu and Jones [1]. Our approach here is to define three basically different kinds of images according to what they represent, without concern for the specific method and/or sensor used to produce them:

Visual: Images that represent the intensity of the light emitted or reflected by bodies within the visible band of the spectrum. A regular photograph is the typical example of this kind.

Thermal: Images whose pixel values represent a measure of the temperature at a specific location. Strictly speaking, they represent the intensity of light emitted or reflected by bodies, but within a certain infrared region of the spectrum. Under certain conditions, the intensity obtained from an infrared (8-12 µm) sensor is precisely related to the temperature by a simple linear equation [2].

Range: Images whose pixel values represent a measure of the distance from the objects to the sensor. In top-view aerial images, these values can represent the elevation of the terrain or of objects.

The methods used to produce the images can be very diverse: the sensors can be active or passive, and they may operate in different spectral bands. A given kind of image can be produced by different methods, but the resulting images are of the same nature and so can be processed by algorithms defined for that kind of image. We now describe a method to perform target detection from sets of three pixel-registered images (visual, thermal, and range) of a given scene.

Detection Algorithm

Figure 1 presents the general scheme for detecting targets from visual-thermal-range image sets. The system is designed to operate on top-view images with pixels represented by bytes (0 to 255). In the visual images, higher values represent brighter points. In the thermal images, higher values represent warmer points. The range images follow a format in which a one-level increment corresponds to a change of 10 cm in elevation. The image resolution is 25 cm per pixel, and the targets have rectangular to elliptical shapes, with an area of 150 to 2000 pixels. The different blocks of Figure 1 are described next.
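
As a quick check on these units, the sketch below converts the stated target areas to ground area and elevation differences to range-image levels. It only restates the figures given above (25 cm per pixel, 10 cm per level); the function and constant names are hypothetical.

```python
# Unit conversions implied by the image format described above.
PIXEL_SIZE_M = 0.25   # ground resolution: 25 cm per pixel side
RANGE_STEP_CM = 10    # one range-image level = 10 cm of elevation


def pixels_to_area_m2(n_pixels):
    """Ground area (m^2) covered by n_pixels pixels."""
    return n_pixels * PIXEL_SIZE_M ** 2


def cm_to_levels(delta_cm):
    """Number of range-image levels for an elevation change given in cm."""
    return delta_cm / RANGE_STEP_CM


# Target areas of 150 to 2000 pixels correspond to 9.375 to 125.0 m^2.
print(pixels_to_area_m2(150), pixels_to_area_m2(2000))
# Elevation differences of 80 cm and 250 cm correspond to 8.0 and 25.0 levels.
print(cm_to_levels(80), cm_to_levels(250))
```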

Figure 1: The detection algorithm

Bright/Dark Point Extractor

The bright/dark-point extractor is used on both visual and thermal images. It extracts points that are either darker or brighter than their surroundings in visual images, and points that are either warmer or colder than their surroundings in thermal images. Our method is similar to that of Nahm [3], with some modifications. Around a given pixel (i,j), it forms a rectangular annular window, one pixel wide, and estimates the window's mean $\mu_{ij}$ and standard deviation $\sigma_{ij}$. Then, to determine whether the pixel is possibly part of a target, it checks whether the pixel value $x_{ij}$ differs from the window mean by more than $k\sigma_{ij}$, for a threshold factor $k$. That is,

if $(x_{ij} - \mu_{ij}) > k\sigma_{ij}$ or $(x_{ij} - \mu_{ij}) < -k\sigma_{ij}$, then mark point (i,j) as a possible target.
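
A minimal pure-Python sketch of this two-sided test follows, operating on an image stored as a list of rows. The ring radius `r` and threshold factor `k` are hypothetical parameters (their values are not given here), and skipping ring pixels that fall outside the image is also an assumption.

```python
import math


def ring_values(img, i, j, r):
    """Pixels on the one-pixel-wide square ring at Chebyshev distance r from (i, j).

    Ring pixels that fall outside the image are skipped (an assumption).
    """
    h, w = len(img), len(img[0])
    return [img[i + di][j + dj]
            for di in range(-r, r + 1)
            for dj in range(-r, r + 1)
            if max(abs(di), abs(dj)) == r
            and 0 <= i + di < h and 0 <= j + dj < w]


def bright_dark_points(img, r=3, k=2.0):
    """Mark pixels that differ from the ring mean by more than k ring std devs.

    r (ring radius) and k (threshold factor) are hypothetical parameters.
    Returns a binary map (list of rows) with 1 = possible target.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ring = ring_values(img, i, j, r)
            if not ring:
                continue
            mu = sum(ring) / len(ring)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in ring) / len(ring))
            d = img[i][j] - mu
            # Two-sided test: brighter/warmer or darker/colder than the surroundings.
            if d > k * sigma or d < -k * sigma:
                out[i][j] = 1
    return out


# Example: one bright pixel on a flat 9x9 background.
img = [[200 if (i, j) == (4, 4) else 100 for j in range(9)] for i in range(9)]
marks = bright_dark_points(img, r=2)
print([(i, j) for i in range(9) for j in range(9) if marks[i][j]])  # [(4, 4)]
```

Only the spike is marked: its ring has zero variance, while the neighbors whose ring contains the spike see a large $\sigma$ and stay unmarked.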

Texture Extractor

The texture extractor operates on visual images. It measures the degree of similarity between adjacent pixels, both for the point under study (i,j) (presumably a target) and for the pixels on an annular window around it (presumably clutter), and then compares the two to see whether they differ by more than a specified amount. As in the bright/dark-point extractor, we calculate a mean $\mu_{ij}$ and a standard deviation $\sigma_{ij}$ for the annular window, but of the absolute difference between adjacent pixels rather than of their intensity. We also calculate $x_{ij}$, the average absolute difference between point (i,j) and its four adjacent points. Our test to determine a target point is then similar to the one used for the bright/dark-point extractor.
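
The sketch below applies the same two-sided test to a map of local absolute differences instead of intensities. How the ring statistic is formed is our assumption: each ring pixel contributes its own mean absolute difference to its four neighbors, computed exactly as $x_{ij}$ is for the center. The parameters `r` and `k` are again hypothetical, and the image is assumed to be larger than 1 × 1.

```python
import math


def local_difference(img):
    """Mean absolute difference between each pixel and its in-image 4-neighbors."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            diffs = [abs(img[i][j] - img[y][x])
                     for y, x in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= y < h and 0 <= x < w]
            out[i][j] = sum(diffs) / len(diffs)
    return out


def texture_points(img, r=3, k=2.0):
    """Two-sided mean/std test applied to local differences.

    x_ij is the center's local difference; mu_ij and sigma_ij come from the
    local differences on the square ring at distance r (assumed statistic).
    r and k are hypothetical parameters.
    """
    d = local_difference(img)
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ring = [d[i + di][j + dj]
                    for di in range(-r, r + 1)
                    for dj in range(-r, r + 1)
                    if max(abs(di), abs(dj)) == r
                    and 0 <= i + di < h and 0 <= j + dj < w]
            if not ring:
                continue
            mu = sum(ring) / len(ring)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in ring) / len(ring))
            # Equivalent to (x - mu) > k*sigma or (x - mu) < -k*sigma.
            if abs(d[i][j] - mu) > k * sigma:
                out[i][j] = 1
    return out
```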

Planar Region Extractor

Since targets are well modeled by a collection of planar regions, the degree of planarity has been proposed as a cue for determining possible targets [4]. A target usually has smooth surfaces (planar over small regions), compared to most forms of clutter (grass, trees, ground). The planar region extractor examines 3 × 3 pixel regions of the range image and obtains an error $e$ with respect to the equation of a plane,

$z = ax + by + c$.

It then uses a threshold $e_{th} = 0.6$, so that a pixel is declared a possible target if $e < e_{th}$.
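
The text does not specify how $e$ is computed. One natural choice, assumed here, is the RMS residual of a least-squares plane fit over the 3 × 3 patch, with $x, y \in \{-1, 0, 1\}$. On this symmetric grid the normal equations decouple, giving $a = \sum xz/6$, $b = \sum yz/6$, and $c = \sum z/9$. Applying $e_{th} = 0.6$ to this particular error measure, in range-image levels, is likewise an assumption.

```python
import math


def plane_fit_error(patch):
    """RMS residual of the least-squares plane z = a*x + b*y + c over a 3x3 patch.

    patch[r][s] sits at x = s - 1, y = r - 1, so x and y take values in {-1, 0, 1}.
    On this grid sum(x) = sum(y) = sum(x*y) = 0 and sum(x*x) = sum(y*y) = 6,
    so the normal equations decouple into the closed forms below.
    """
    cells = [(s - 1, r - 1, patch[r][s]) for r in range(3) for s in range(3)]
    a = sum(x * z for x, _, z in cells) / 6
    b = sum(y * z for _, y, z in cells) / 6
    c = sum(z for _, _, z in cells) / 9
    sse = sum((z - (a * x + b * y + c)) ** 2 for x, y, z in cells)
    return math.sqrt(sse / 9)


def planar_points(rng, e_th=0.6):
    """1 where the 3x3 neighborhood is nearly planar (e < e_th); borders stay 0."""
    h, w = len(rng), len(rng[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = [row[j - 1:j + 2] for row in rng[i - 1:i + 2]]
            if plane_fit_error(patch) < e_th:
                out[i][j] = 1
    return out


# A tilted plane fits exactly; an isolated spike does not.
print(plane_fit_error([[0, 2, 4], [3, 5, 7], [6, 8, 10]]))  # 0.0
print(plane_fit_error([[0, 0, 0], [0, 9, 0], [0, 0, 0]]))
```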

Predefined Elevation Extractor

If we have basic knowledge of the kind of targets to search for (in our case, tanks), we can easily check whether a point under study has an elevation suggesting a possible target. For this, we calculate $\mu_{ij}$, the average elevation of the surface in an annular window around a point, and compare it with $x_{ij}$, the elevation of the point (i,j):

if $80\ \mathrm{cm} < x_{ij} - \mu_{ij} < 250\ \mathrm{cm}$, then mark point (i,j) as a possible target.
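
With 10 cm per range-image level, the test becomes $8 < x_{ij} - \mu_{ij} < 25$ in levels. The sketch below assumes the same one-pixel-wide square ring as in the earlier extractors; its radius `r` is hypothetical and should exceed half the target's extent so the ring samples the surrounding ground.

```python
RANGE_STEP_CM = 10  # one range-image level = 10 cm of elevation (from the text)


def ring_mean(img, i, j, r):
    """Mean of the square ring at Chebyshev distance r around (i, j), or None if empty."""
    h, w = len(img), len(img[0])
    vals = [img[i + di][j + dj]
            for di in range(-r, r + 1)
            for dj in range(-r, r + 1)
            if max(abs(di), abs(dj)) == r
            and 0 <= i + di < h and 0 <= j + dj < w]
    return sum(vals) / len(vals) if vals else None


def elevation_points(rng, r=20, lo_cm=80, hi_cm=250):
    """Mark points standing between lo_cm and hi_cm above their ring's mean elevation."""
    lo, hi = lo_cm / RANGE_STEP_CM, hi_cm / RANGE_STEP_CM  # 8 and 25 levels
    h, w = len(rng), len(rng[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            mu = ring_mean(rng, i, j, r)
            if mu is not None and lo < rng[i][j] - mu < hi:
                out[i][j] = 1
    return out


# Example: a 3x3 "target" 15 levels (150 cm) above flat ground at level 50.
rng = [[65 if 6 <= i <= 8 and 6 <= j <= 8 else 50 for j in range(15)]
       for i in range(15)]
marks = elevation_points(rng, r=3)
print(sum(map(sum, marks)))  # 9: every target pixel, no ground pixel
```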

Spurious Region Cleaner

After the operators mentioned above (or inside them, when possible), a 4:1 downsampling is performed, which reduces complexity while maintaining most of the detection information. The result is a set of smaller binary images (1: target, 0: no target) containing regions that do not yet give a correct target detection. The problem is that there are many isolated single-pixel spots, as well as spots that clearly do not have the shape of a target. Also, some spots can be recognized by eye as targets but have many "holes" (pixels with value 0) in them. To remove this "noise," we use a series of morphological operations designed specifically for this purpose. We propose an erosion operator eros-n(im, n), "erosion of strength n," which works as follows: a 3 × 3 template is passed over the binary image im. Around each pixel (i,j), the number of 1s is counted. If this count is at least n, the output for pixel (i,j) is 1; otherwise, it is 0. This operator can be used in series, with different values of n, and gives excellent results.

Table 1: Detection results; (✓) detected, (x) miss.

This simple but powerful operator can be stated in MATLAB code as follows:

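b = [1 1 1; 1 1 1; 1 1 1];               % 3 x 3 template of ones
imOUT = conv2(double(imIN), b, 'same');  % number of 1s in each 3 x 3 neighborhood (zero-padded borders)
imOUT = (imOUT >= n);                    % 1 where at least n of the 9 pixels are 1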
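
For readers without MATLAB, an equivalent pure-Python sketch follows (named `eros_n`, since Python identifiers cannot contain a hyphen). Out-of-image neighbors count as 0, matching the zero padding of `conv2` with the `'same'` option. Because the count includes the center pixel, any n ≤ 8 also turns a 0 surrounded by eight 1s into a 1, which fills single-pixel holes.

```python
def eros_n(im, n):
    """Erosion of strength n on a binary image given as a list of rows of 0/1.

    The output is 1 where the 3x3 neighborhood (the pixel itself included)
    contains at least n ones. Neighbors outside the image count as 0, which
    matches the zero padding of conv2(..., 'same').
    """
    h, w = len(im), len(im[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            count = 0
            for y in range(max(i - 1, 0), min(i + 2, h)):
                for x in range(max(j - 1, 0), min(j + 2, w)):
                    count += im[y][x]
            out[i][j] = 1 if count >= n else 0
    return out
```

On a 3 × 3 blob plus one isolated pixel, `eros_n(im, 4)` keeps the whole blob and drops the isolated pixel, whereas `eros_n(im, 9)` (classical erosion) keeps only the blob's center.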

When n = 9, it reduces to the classical erosion operator. By using n < 9, we keep points of the input image that are important but would be eliminated by other erosion methods. Note also that the operator is independent of shape: only the number of 1s in the neighborhood matters, not their arrangement. An analogous "dilation of strength n" can also be defined. The spurious region cleaning (SRC) operator is defined as follows:

$im_{out} = \mathrm{dilate}(\text{eros-n}(\text{eros-n}(im_{in}, n_1), n_2))$

Applying two erosion operators in series is very effective at eliminating spurious target pixels, for different densities and target-to-clutter contrasts. The parameters $n_1$ and $n_2$ are tuned to work specifically on each kind of image. The dilate operator is used to join together points that are likely to belong to the same target. For each of the binary images, the output consists of small regions inside the likely targets, usually with very few false alarms.
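
The SRC operator can be sketched the same way. We assume that `dilate` is classical dilation with a 3 × 3 square template, which is the same count-based rule with threshold 1. The values n1 = n2 = 4 in the example are placeholders, since the paper tunes them per image.

```python
def count_at_least(im, n):
    """1 where the zero-padded 3x3 neighborhood of a pixel holds at least n ones."""
    h, w = len(im), len(im[0])
    return [[1 if sum(im[y][x]
                      for y in range(max(i - 1, 0), min(i + 2, h))
                      for x in range(max(j - 1, 0), min(j + 2, w))) >= n else 0
             for j in range(w)]
            for i in range(h)]


def eros_n(im, n):
    """Erosion of strength n (n = 9 gives classical 3x3 erosion)."""
    return count_at_least(im, n)


def dilate(im):
    """Classical 3x3 dilation (assumed template): 1 if any neighborhood pixel is 1."""
    return count_at_least(im, 1)


def src(im, n1, n2):
    """Spurious region cleaning: dilate(eros_n(eros_n(im, n1), n2))."""
    return dilate(eros_n(eros_n(im, n1), n2))


# Example: a 3x3 blob plus three isolated noise pixels on a 9x9 image.
im = [[1 if (3 <= i <= 5 and 3 <= j <= 5) or (i, j) in {(0, 0), (0, 8), (8, 8)} else 0
       for j in range(9)] for i in range(9)]
out = src(im, 4, 4)
for row in out:
    print("".join(".#"[v] for v in row))
```

The printout shows the noise pixels removed and the blob grown to a 5 × 5 block by the final dilation.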

Majority Decision

The final function of the detector is to combine the results of the individual detectors into the final output. Our method checks the five detectors and, for each pixel, assigns a 1 if three or more of the inputs are 1, and a 0 otherwise. If a cluster of 1s overlaps a target, the target is declared detected; otherwise, it is declared a miss. A cluster not overlapping any target is declared a false alarm (FA).
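
A sketch of the majority vote and the scoring rule follows; the five inputs would be the binary maps from the texture, visual bright/dark, thermal bright/dark, predefined elevation, and planar region extractors. Two details are assumptions not fixed by the text: clusters are taken as 8-connected groups of 1s, and each ground-truth target is given as a set of pixel coordinates.

```python
from collections import deque


def majority(maps, min_votes=3):
    """1 where at least min_votes of the binary maps are 1."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[1 if sum(m[i][j] for m in maps) >= min_votes else 0
             for j in range(w)] for i in range(h)]


def clusters(im):
    """8-connected clusters of 1s (assumed connectivity), each as a set of (i, j)."""
    h, w = len(im), len(im[0])
    seen, result = set(), []
    for i in range(h):
        for j in range(w):
            if im[i][j] and (i, j) not in seen:
                comp, queue = set(), deque([(i, j)])
                seen.add((i, j))
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w and im[ny][nx]
                                    and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                result.append(comp)
    return result


def score(detection, targets):
    """Per-target detected flags and the false-alarm count.

    targets: list of sets of (i, j) pixels, one set per ground-truth target.
    """
    comps = clusters(detection)
    detected = [any(c & t for c in comps) for t in targets]
    false_alarms = sum(1 for c in comps if not any(c & t for t in targets))
    return detected, false_alarms
```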

Test Data and Results

We analyzed three different sets of images (each consisting of a visual, a thermal, and a range image), representing three scenes. The first scene has two tanks in a dry area without vegetation. The second scene has three tanks, including one partially occluded by vegetation. The third scene also has three tanks, including one partially occluded by vegetation, and it contains several pieces of cultural clutter, such as small buildings and bridges. The last two scenes contain bodies of water as well. The images, 512 × 512 pixels, are synthetic, but were generated with information from real visual images.

The generation process was as follows. Visual backgrounds were taken from selected aerial photographs, which were clipped and scaled to match our objectives. We then embedded visual images of tanks in these backgrounds using interactive programs. The location and orientation of the targets were chosen to resemble a real scene as closely as possible. Next, thermal images were generated by using interactive tools to define temperature values for every part of the image, and then post-processed with filtering, interpolation, and the addition of spatially correlated random data. Finally, the range images were synthesized in a similar way, incorporating not only the elevation data but also the random height variability of the different surfaces composing the scene, as would result from sub-pixel information.

As seen in Table 1, some targets could not be detected from individual images, but most were correctly detected by our integrated system. Of the eight targets, seven were correctly detected, and only eight false alarms were produced, most of them from cultural clutter in the third scene. The number of false alarms per scene was no larger than that of the best individual detector. The only miss corresponds to a partially occluded target. We believe that our system would perform even better on noisier thermal images. We plan to apply our system to additional real scenes when data becomes available, to see its actual perfor-


Figure 2: Detection process for scene 2. (a-c) Original visual, thermal, and range images. (d-h) Detection before SRC (left) and after SRC (right), as produced by: (d) the texture extractor on the visual image, (e) the bright/dark extractor on the visual image, (f) the bright/dark extractor on the thermal image, (g) the predefined elevation extractor on the range image, and (h) the planar region extractor on the range image. (i) Overall detection results.